Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix interface bind #4510

Closed
wants to merge 2 commits into from

Conversation

alexpyattaev
Copy link

Problem

There is a bunch of code in net-utils that does not obey bind-address CLI arg when called from agave-validator.
This is part of a bigger problem as outlined in #4440

Summary of Changes

  • Switched implementation in net-utils towards tokio (std TCP sockets do not support BINDTODEVICE)
  • Updated callsites in validator main to pass bind-address as needed

Fixes #

@bw-solana
Copy link

It would be helpful as a reviewer to make this into a few distinct PRs (or commits that could be reviewed serially):

  1. Tokio RT
  2. Passing around optional bind addr
  3. Some abstraction cleanup (e.g. do_make_request)

@bw-solana bw-solana requested a review from gregcusack January 16, 2025 23:48
Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya agree with Brennan. Would be nice if you broke this up into three PRs.

  1. Switch to tokio
  2. deprecate methods and add new ones (change to code to use ones)
  3. helper methods.

General question though: why are we using tokio here?

Comment on lines +118 to 149
pub fn get_public_ip_addr(
ip_echo_server_addr: &SocketAddr,
bind_address: Option<IpAddr>,
) -> anyhow::Result<IpAddr> {
let fut = ip_echo_server_request(
*ip_echo_server_addr,
IpEchoServerMessage::default(),
bind_address,
);
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()?;
let resp = rt.block_on(fut)?;
Ok(resp.address)
}

pub fn get_cluster_shred_version(ip_echo_server_addr: &SocketAddr) -> Result<u16, String> {
let resp = ip_echo_server_request(ip_echo_server_addr, IpEchoServerMessage::default())?;
pub fn get_cluster_shred_version(
ip_echo_server_addr: &SocketAddr,
bind_address: Option<IpAddr>,
) -> anyhow::Result<u16> {
let fut = ip_echo_server_request(
*ip_echo_server_addr,
IpEchoServerMessage::default(),
bind_address,
);
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()?;
let resp = rt.block_on(fut)?;
resp.shred_version
.ok_or_else(|| String::from("IP echo server does not return a shred-version"))
.ok_or_else(|| anyhow!("IP echo server does not return a shred-version"))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest you deprecate these functions first so it doesn't break downstream users of net-utils. Steps: deprecate them, rewrite the new functions with the new bind_address option, change code to use the new functions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I figured noone outside of anza is using these anyway. But if you think we must then we can do the whole deprecation dance as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya to be honest, I do not know if anyone is using these outside Anza. But if they are, good to not break the interface.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's sometimes hard to say: https://crates.io/crates/solana-net-utils

we've been slapped on the wrist for breaking public interfaces too often, so we've been getting stricter on this.

If we need to break compatibility or have a really compelling reason to, we can always ask Jacob/Nick to understand the downstream impact

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure we can keep old functions as they are, and add new ones that bind to interface. We do not always need to bind, so old ones will still be used.

let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.expect("Can not create a runtime");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.expect() error messages should be styled as "expect as precondition". see here: https://doc.rust-lang.org/std/error/index.html#common-message-styles. Basically formatted as "tokio builder should create a runtime" or something

pub fn get_public_ip_addr(
ip_echo_server_addr: &SocketAddr,
bind_address: Option<IpAddr>,
) -> anyhow::Result<IpAddr> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would maybe just recommend staying with io::Result instead of changing to anyhow::Result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code did not report specific errors (it reported just String) so I have stuck with the convention. io::Result can not encode all ways this stuff can fail anyways, at least not cleanly. Since we are not handling these errors with match I see no point making life difficult.

@alexpyattaev
Copy link
Author

General question though: why are we using tokio here?
Tokio is used because
a) it can bind outgoing connections to the interfaces (and std:: sockets can not)
b) it can do all the same work as the original code without spawning unnecessary threads
c) Also tokio does timeouts in a much more clean way.

I will close this PR and do the following three instead:

  1. rewrite of net-utils internals (switch to tokio everywhere, remove junk etc).
  2. Change of the API and addition of the functions that also accept bind address
  3. Patching the code in validator main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants